Nucleotide binding as an allosteric regulatory mechanism for Akkermansia muciniphila β-N-acetylhexosaminidase Am2136

ABSTRACT β-N-acetylhexosaminidases (EC 3.2.1.52), which belong to glycosyl hydrolase family GH20, are important enzymes for oligosaccharide modification. Numerous microbial β-N-acetylhexosaminidases have been investigated for applications in biology, biomedicine and biotechnology. Akkermansia muciniphila is an anaerobic intestinal commensal bacterium that possesses specific β-N-acetylhexosaminidases for colonization of the gut mucosal layer and for mucin degradation. In this study, we assessed the in vitro mucin glycan cleavage activity of the A. muciniphila β-N-acetylhexosaminidase Am2136 and demonstrated its ability to hydrolyze the β-linkages joining N-acetylglucosamine to a wide variety of aglycone residues, which indicates that Am2136 may be a generalist β-N-acetylhexosaminidase. Structural analysis and enzyme activity assays allowed us to probe the essential function of the inter-domain interactions involving β23-β33. Importantly, we found that the hydrolysis activity of Am2136 is enhanced by nucleotides. We further speculate that this activation mechanism might be associated with conformational motions between domains III and IV. To our knowledge, this is the first report of a β-N-acetylhexosaminidase regulated by a nucleotide effector, which points to novel biological functions of the enzyme. These findings contribute to understanding the distinct properties within the GH20 family and lay a foundation for developing controllable glycan-hydrolyzing catalysts.
Abbreviations: OD600 - optical cell density at 600 nm; LB - Luria–Bertani; IPTG - isopropyl β-D-1-thiogalactopyranoside; PMSF - phenylmethanesulfonyl fluoride; rmsd - root mean square deviation; GlcNAc - N-acetyl-β-D-glucosamine; GalNAc - N-acetyl-β-D-galactosamine; Gal - galactose


Introduction
Oligosaccharides are composed of different sugar units and, owing to their wide variety of biological activities, have potential commercial value in the food, pharmaceutical and cosmetic industries. 1 β-N-acetylhexosaminidase is one of the most abundant glycosidases and specifically hydrolyzes both β-GlcNAc and β-N-acetylgalactosamine (β-GalNAc) units from the non-reducing end of glycan chains. 2 Because these enzymes degrade a wide variety of substrates and are available from many sources, including bacteria, fungi and arthropods, 3,4 various therapeutic and biotechnological applications have been proposed for β-N-acetylhexosaminidases, including the industrial-scale production of distinct functional carbohydrates and glycan derivatives. 5 Christoph Mayer, Vladimír Křen and colleagues have described the classical catalytic mechanisms of β-N-acetylhexosaminidases. In the "double-displacement mechanism", a nucleophilic Asp/Glu residue attacks the substrate to form an enzyme-substrate intermediate; a water molecule then attacks this intermediate and transfers a proton to the acid/base residue. 3 The "substrate-assisted mechanism" is also a kind of double-displacement mechanism and differs only in the nature of the intermediate: the 2-acetamido group of the substrate HexNAc acts as the nucleophile and forms an oxazolinium ion intermediate, and the catalytic Asp/Glu residue then activates a water molecule that hydrolyzes the intermediate to generate the reaction product. 6 Based on their distinct mechanisms, structures and functions, the known exo-β-N-acetylhexosaminidases have been grouped into several families, including GH3, GH5, GH20, GH84, GH109 and GH116, although the details of the oxidative mechanism of GH109 are still missing. 7 The GH20 family contains the largest number of β-N-acetylhexosaminidases, and its members generally use the substrate-assisted mechanism. 8 Within GH20, the eukaryotic enzymes are dimers in which each monomer has two domains, 9 whereas the prokaryotic enzymes possess diverse structures.
A growing number of structural studies continue to expand our understanding of the correlation between function and structure in glycosidases. The enzyme SpHex from Streptomyces plicatus has only two conserved domains, 10 whereas GcnA from Streptococcus gordonii 11 and SmGH20A from Serratia marcescens 12 have three and four domains, respectively. These different structures confer various functions on the enzymes, and some of them even participate in the synthesis of oligosaccharides. 13 GH20 enzymes with β-N-acetylhexosaminidase activity have a wide range of cellular functions across different organisms, ranging from cell growth control to host-pathogen interactions. For instance, gut microbes release specific mucin-degrading enzymes such as β-N-acetylhexosaminidases to regulate the intestinal microecology. 14 Akkermansia muciniphila is a well-known mucin-degrading intestinal commensal bacterium, and extensive studies have revealed that A. muciniphila has many clinical benefits. 15,16 The outstanding mucin-degrading capability of A. muciniphila is mainly attributed to the variety and abundance of its glycosidases. 17 More than 3% of the genes in the A. muciniphila genome encode β-N-acetylhexosaminidases, 18 and although a few of these enzymes have been reported, 19-21 their distinct structural and biochemical properties still require further study. Recently, a novel GH20 glycosidase, Am2136, characterized by its additional domains, has been reported. 20 However, the details of its function and mechanism are not yet clear. In this article, we determined the crystal structure of Am2136 in apo form and revealed the ability of Am2136 to act on a wide range of oligosaccharides. Importantly, we found that the catalytic activity of Am2136 can be regulated by nucleotides, and we propose an explanation for the mechanism based on the structural and biochemical data.
Overall, our results indicate a potential role of Am2136 in the metabolism of sugar nucleotides and lay a foundation for the further development of novel GlcNAcases whose activities can be fine-tuned by varying the concentration of allosteric modulators.

Bacterial growth conditions and gene cloning
A. muciniphila (ATCC BAA-835) was cultured under anaerobic conditions at 37°C in Brain Heart Infusion (BHI) media supplemented with 0.05% (w/v) hog gastric mucin type III (Sigma Aldrich). Growth was measured by spectrophotometer as optical density at 600 nm (OD600) (NanoDrop 2000, Thermo Fisher).
The coding region of Am2136 without the signal peptide (residues Q20-L756) was defined according to UniProt 22 and amplified by PCR on a thermal cycler (Gene Touch, BIOER, Hangzhou, China). The gene (Amuc_2136) was cloned into pET-22b with an N-terminal 6×His tag using the ClonExpress II One Step Cloning Kit (Vazyme) to generate pET22b-His-Amuc_2136. Using this plasmid as the template, the mutants were constructed by the QuickChange PCR-based method (High Fidelity Master Mix, MCLAB). All recombinant plasmids were verified by DNA sequencing.

Protein expression and purification
The recombinant plasmid was transformed into E. coli strain BL21 (DE3) for expression. A single colony was used to inoculate 50 ml of Luria-Bertani (LB) medium containing 30 μg/ml ampicillin, and the culture was shaken at 37°C overnight. Then, 5 ml portions were used to inoculate four 2 L flasks, each containing 1 L of the same medium. The cultures were grown with moderate shaking (220 rpm) at 37°C to an optical density at 600 nm (OD600) of 0.8 and then cooled to 16°C. Expression was induced by adding isopropyl β-D-1-thiogalactopyranoside (IPTG) to 0.05 mM, and the cultures were shaken for an additional 12-14 h at 16°C.
For expression of selenium-labeled Am2136, the recombinant plasmid was transformed into E. coli strain B834 (DE3). A single colony was likewise grown in LB medium with shaking at 37°C overnight. The LB medium was removed by centrifugation at 4000 rpm for 15 min at 4°C, and the cells were then cultured in M9 medium supplemented with various amino acids and vitamins at 37°C until the OD600 reached 0.8. Finally, 0.05 mM IPTG was added and expression was induced for 16 h at 16°C.
All cells were harvested by centrifugation at 4000 rpm for 15 min at 4°C, resuspended in lysis buffer containing 25 mM Tris-HCl (pH 8.0), 150 mM NaCl, 5% glycerol and 1 mM phenylmethanesulfonyl fluoride (PMSF), and lysed by sonication. The lysate was cleared by centrifugation at 17,000 g for 30 min at 4°C, and the supernatant was incubated with Ni-NTA resin (Sigma Aldrich) for 60 min at 4°C. The Ni-NTA column was washed with lysis buffer, and the protein was eluted with elution buffer (25 mM Tris-HCl pH 8.0, 150 mM NaCl, 5% glycerol, 350 mM imidazole). The eluted protein solution was concentrated to 1 ml using a Centricon filter (50 kDa cutoff; Millipore) and exchanged into gel filtration buffer (25 mM Tris-HCl pH 8.0, 150 mM NaCl, 5% glycerol) on a Superdex 200 Increase 10/300 GL column (GE Healthcare) at 0.5 ml/min at 4°C. The protein was snap-frozen in liquid nitrogen and stored at −80°C. The protein used for MS analysis was purified in a volatile buffer (150 mM NH4Ac, pH 8.0); the procedure was otherwise unchanged.

Crystallization, structure determination and refinement
The protein samples were concentrated to 6.5 mg/ml for crystallization experiments. Initial crystallization experiments were carried out in 96-well Mosquito (TTP LabTech Ltd.) plates with commercially available crystallization screens from Hampton Research and Rigaku (Index HT, Crystal Screen HT, WIZARD HT, Xtal Quest HT, Salt RX HT, PEG RX HT). Crystallization was performed at 293 K by the hanging-drop vapor diffusion method, with 200 nL drops containing protein solution and reservoir buffer in a 1:1 ratio. The final optimized crystals were obtained from 0.2 M Li2SO4, 25% w/v PEG3350, 0.1 M HEPES, pH 7.5. The crystals were transferred to reservoir solution supplemented with 20% glycerol and flash-cooled in liquid nitrogen. Diffraction data were collected with a CCD detector at station BL-18U of the Shanghai Synchrotron Radiation Facility (Shanghai, China). Data processing and scaling were carried out using the HKL2000 software package. 23 The data for selenium-labeled Am2136 were processed to a resolution limit of 2.9 Å in space group P12(1)1, with unit-cell parameters a = 96.20 Å, b = 119.51 Å and c = 161.93 Å. The phases were calculated using AutoSol. 24 Model building and structure refinement were performed using COOT 25 and PHENIX, 26 and the structural representations were prepared with PyMOL. 27

Molecular docking
The solved Am2136 apo structure was first processed with PyMOL, and docking was performed with AutoDock 4.2.6. 28 Additional molecules were deleted, and the protonation state of the structure was adjusted for neutral pH. A grid box of 40 × 40 × 40 points with a spacing of 0.375 Å was used, centered at x = −20.101, y = 18.175 and z = −58.633. Gasteiger charges were assigned to the protein and ligand molecules, and the molecule center was at approximately (43.151, −13.360, 79.262). Exhaustiveness was set to 200, and a computer with eight processors was used for the computation. One hundred poses were generated for GDP, and the image representing the best pose was prepared with PyMOL. An additional blind docking was performed using the Swissdock web server (http://swissdock.vital-it.ch/). 29
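As a quick sanity check on the grid definition, the physical extent of the docking box follows from the number of points and the spacing. The sketch below assumes that the stated number of points counts grid intervals along each axis, so that 40 points at 0.375 Å span 15 Å; the function name grid_box_bounds is illustrative and not part of any docking package.

```python
def grid_box_bounds(center, npts, spacing):
    """Return the (min, max) corner coordinates of a docking grid box.

    Assumption: npts counts intervals along each axis, so the box edge
    length is npts * spacing.
    """
    half = [n * spacing / 2.0 for n in npts]
    lower = tuple(c - h for c, h in zip(center, half))
    upper = tuple(c + h for c, h in zip(center, half))
    return lower, upper


# Values taken from the docking description in the text.
center = (-20.101, 18.175, -58.633)
lower, upper = grid_box_bounds(center, (40, 40, 40), 0.375)
edge_length = upper[0] - lower[0]  # edge of the cubic box in angstroms
print(lower, upper, edge_length)
```

Under this assumption the box is a 15 Å cube, which is large enough to enclose a single nucleotide-sized pocket but not the whole protein.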

Enzyme assays of Am2136 wild-type and mutants
The activity was determined by monitoring the change in absorbance at 405 nm caused by the liberated 4-nitrophenolate, with 51 time points recorded within 5 min on a Cytation3 (BioTek). The measurements were conducted at 37°C using p-nitrophenyl (pNP)-β-GlcNAc (Sigma) as the substrate. The 50 μL reaction mixture contained 0.125-4 mM substrate, 25 mM Tris pH 8.0 and 150 mM NaCl, and the reaction was started by adding protein to a final concentration of 2 μM. To study the influence of nucleotides on the enzymatic activity, 1 mM NDP or NTP was added to the reaction system. All experiments were repeated three times.
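The kinetic parameters reported later (K m and V max) are obtained from initial rates measured across the substrate series. The text does not state the fitting procedure, so the following is only a minimal sketch of one standard option, the Hanes-Woolf linearization of the Michaelis-Menten equation. The two-fold substrate series and the V max value are illustrative assumptions; the K m of 1.46 mM is the wild-type value quoted in the results. Nonlinear regression is generally preferred for real, noisy data.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate at substrate concentration s."""
    return vmax * s / (km + s)


def fit_hanes_woolf(substrate, rates):
    """Estimate (Vmax, Km) from the linear form [S]/v = [S]/Vmax + Km/Vmax."""
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals 1 / Vmax
    intercept = mean_y - slope * mean_x  # equals Km / Vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Hypothetical two-fold series spanning the 0.125-4 mM range from the text.
substrate_mM = [0.125, 0.25, 0.5, 1.0, 2.0, 4.0]
# Noise-free synthetic rates; Vmax = 100 is an arbitrary illustrative value.
rates = [michaelis_menten(s, 100.0, 1.46) for s in substrate_mM]
vmax_fit, km_fit = fit_hanes_woolf(substrate_mM, rates)
print(vmax_fit, km_fit)
```

With noise-free input the fit recovers the parameters used to generate the data, which makes the function easy to check before applying it to measured rates.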

The metal ion effect on enzyme hydrolysis activity
EDTA was added to the purified protein to a final concentration of 5 mM, and the sample was incubated on ice for 20 min. The protein was then dialyzed against buffer containing EDTA (2 mM EDTA, 25 mM Tris, 150 mM NaCl, 5% glycerol, pH 8.0) at 4°C for 4 h at a volume ratio of 1:350. After that, the protein was dialyzed against EDTA-free buffer for 4 h, and this step was repeated twice.

The nucleotide binding affinity
The nucleotide-binding ability was determined on a Duetta fluorescence spectrometer (HORIBA) by measuring the change in peak intensity. Fluorescence was excited at 286 nm and the emission was recorded at 336 nm. The measurements were made at 400 V with a 2.5 nm excitation slit and a 5 nm emission slit. The 120 μL reaction mixture contained 0-550 μM nucleotide, 5 μM protein, 25 mM Tris pH 8.0 and 150 mM NaCl. The samples were incubated at 4°C for 5 min before measurement.
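The text does not specify how K D was derived from the titration. A common choice for this kind of intrinsic-fluorescence experiment, given here only as an assumed model, is a one-site binding isotherm in which the change in fluorescence intensity is proportional to the fraction of protein with ligand bound:

```latex
\[
\Delta F([L]) \;=\; \Delta F_{\max}\,\frac{[L]}{K_D + [L]}
\]
```

Here [L] is the nucleotide concentration and ΔF max is the signal change at saturation. This form treats the free ligand concentration as equal to the total, which is a reasonable approximation when the nucleotide (up to 550 μM) is in large excess over the protein (5 μM) but is less accurate at the lowest points of the titration.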

The capability of mucin hydrolysis
The mucin hydrolysis activity of recombinant Am2136 was first probed by thin-layer chromatography (TLC). The 100 μL reaction mixture contained 10 mg/ml hog gastric mucin (type III, Sigma), 100 μM protein and 150 mM NH4Ac pH 8.0. After reaction at 37°C for about 14 h, the samples were centrifuged at 12,000 rpm for 3 min at room temperature. The supernatant was spotted on a silica gel plate and allowed to air dry. The plate was then placed in a chamber containing a running buffer of 1-butanol, absolute ethanol and ddH2O at a ratio of 5:3:2. The plate was dried and then visualized using a reagent containing 2% diphenylamine in acetone, 2% aniline in acetone and 85% phosphoric acid at a ratio of 5:5:1, with heating at 110°C for 15 min.
The samples were heated at 100°C for 5 min and centrifuged at 15,000 rpm for 30 min before being further separated by high-performance liquid chromatography with an evaporative light scattering detector (HPLC-ELSD). HPLC (UltiMate 3000, Thermo Fisher) was carried out on a WELCH Ultimate® XB-NH2 column (4.6 × 250 mm, 5 μm) at 30°C. The mobile phase was 70% acetonitrile (ACN) at a flow rate of 1 ml/min, and the injection volume was 2 μL. The effluent was monitored by an ELSD detector (Alltech 2000ES) with a drift tube temperature of 90°C. The nitrogen flow rate was 2 L/min, and the gain was set to 1.
Replicate samples were separated without detection, collected, and then analyzed on an ultrafleXtreme MALDI-TOF/TOF MS (Bruker Daltonics). All spectra were obtained in reflectron mode with an acceleration voltage of 24.59 kV, a reflector voltage of 26.6 kV and a pulsed ion extraction of 100 ns in the positive ion mode. TOF/TOF was used to measure the fragmented glycan ions. Precursor ions were accelerated to 7.38 kV and selected with a timed ion gate. Fragment ions generated by laser-induced decomposition of the precursor were further accelerated to 18.98 kV in the LIFT cell. Finally, the data were analyzed with FlexAnalysis 3.3, and the glycan structures were annotated with GlycoWorkbench 2 using a signal-to-noise ratio greater than 6.

The capability of various oligosaccharides hydrolysis
The 30 μL reaction mixture contained 1.2 mM oligosaccharide (Sigma), 4 μM protein and 150 mM NH4Ac pH 8.0. The qualitative and quantitative analyses were performed by high-performance liquid chromatography-tandem triple quadrupole mass spectrometry (HPLC-QqQ-MS) (Agilent 1260 HPLC system and 6460 QqQ system, Agilent Technologies, USA). The separation was performed on a Chromplus C18 column (4.6 × 250 mm, 5 μm) at 25°C. The samples were eluted isocratically with 0.1% formic acid in water and acetonitrile (80:20, v/v) at a flow rate of 0.5 mL/min with an injection volume of 5 μL. The mass spectrometer was operated in electrospray ionization mode with a drying gas temperature of 300°C, an N2 gas flow of 11 L/min, a nebulizer pressure of 15 psi and a capillary voltage of 4000 V.
Qualitative analysis was carried out in MS2 full-scan mode, while quantitative analysis was performed by multiple reaction monitoring (MRM) in positive ionization mode. Amygdalin was used as the internal standard (IS), and each sample was prepared by adding 20 μL of internal standard (116.96 μM) and 170 μL of water to 10 μL of reaction mixture. The ion transitions were m/z 222.0→203.9 for GlcNAc and m/z 458.1→296.0 for the IS. The fragmentor voltages for GlcNAc and the IS were set to 70 and 130 V, respectively. The collision energies for GlcNAc and the IS were set to 1 and 10 eV, respectively. Signal acquisition and peak integration were performed using the MassHunter Qualitative Analysis Software (Agilent, USA). The concentration of GlcNAc in each sample was determined using the established calibration curve (y = 0.768x + 2.4873) prepared from the standard compound.
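Converting an instrument response into a GlcNAc concentration in the original reaction involves two steps: inverting the calibration line and correcting for the dilution introduced during sample preparation. The sketch below uses the slope and intercept reported above. It assumes, since the text does not say, that y is the measured response, that x is the GlcNAc concentration in the injected sample, and that the result must be scaled back to the undiluted reaction mixture; all function names are illustrative.

```python
SLOPE = 0.768       # from the reported calibration curve y = 0.768x + 2.4873
INTERCEPT = 2.4873


def dilution_factor(reaction_ul=10.0, internal_std_ul=20.0, water_ul=170.0):
    """Total prepared volume divided by the volume of reaction mixture used."""
    return (reaction_ul + internal_std_ul + water_ul) / reaction_ul


def internal_standard_final_uM(stock_uM=116.96, internal_std_ul=20.0,
                               total_ul=200.0):
    """Amygdalin concentration in the prepared sample."""
    return stock_uM * internal_std_ul / total_ul


def glcnac_in_reaction(response, slope=SLOPE, intercept=INTERCEPT):
    """Invert the calibration line, then undo the sample dilution.

    Assumption: the calibration x-axis refers to the injected (diluted) sample.
    """
    injected = (response - intercept) / slope
    return injected * dilution_factor()


print(dilution_factor())             # 10 uL reaction in 200 uL total
print(internal_standard_final_uM())
print(glcnac_in_reaction(INTERCEPT + SLOPE * 10.0))
```

If the calibration standards were instead prepared with the same dilution as the samples, the final multiplication by the dilution factor would not be needed.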

The mucin cleavage capacity of Am2136
The gastric mucin secreted by the gut epithelium is a high-molecular-weight glycoprotein, and the main components of its glycan chains are N-acetyl-D-glucosamine (GlcNAc), N-acetyl-D-galactosamine (GalNAc), galactose (Gal), fucose and sialic acid. 30 The various monosaccharides derived from cleaved mucin glycans are not only an energy source for gut microbes but also important substrates for the synthesis of functional short-chain fatty acids (SCFA). 15 A. muciniphila has evolved distinct glycosidases to consume mucin glycans. 31 As the major group of GH20 family glycosidases, β-N-acetylhexosaminidases are responsible for the efficient mucin-degrading capability of A. muciniphila and show immense structural and functional diversity. Previous reports have indicated that the β-N-acetylhexosaminidase Am2136 can act on both pNP-β-GlcNAc and pNP-β-GalNAc, with a preference for pNP-β-GlcNAc. 20 To verify its mucin hydrolysis ability and glycosidic linkage specificity, we incubated hog gastric mucin with Am2136 at 37°C for 14 h and then analyzed the reaction products by thin-layer chromatography (TLC). Mannose and glucose standards were used to test the reaction system (Fig S1A). Compared with Fig S1B, the ladder of vertical migration in Fig S1C indicated that carbohydrates were generated in the Am2136-treated groups. This is consistent with the HPLC-ELSD analysis, in which extra peaks (retention time 7.5-9 min) were observed only in Am2136-treated mucin (Fig S1D-F). As these extra peaks were composed of various oligosaccharides (Fig S1G), we deduced that Am2136 is active on mucins.
Based on previous studies of the glycosyl composition of mucin, 30 we incubated Am2136 with various oligosaccharides, including GlcNAc-(β1,6)Gal-(β1,4)Glc, GlcNAc-(β1,6)GalNAc-α-pNP, GlcNAc-(β1,2)Man and Methyl GlcNAc-(β1,3)Gal, and analyzed the products by QqQ-MS. The structures of these carbohydrates and of amygdalin (IS) are presented in Fig S2A-F. Fig S3 shows the MS spectra of GlcNAc and of the oligosaccharide substrates before reaction. After treatment with Am2136 for 1 h at 37°C, an extra GlcNAc peak appeared (Figure 1a-d), indicating that Am2136 can hydrolyze the glycosidic bond adjacent to the 2-acetamido group in all of the tested substrates, each of which contains a non-reducing β-linked N-acetylglucosamine. In general, substrates with D-gluco structures are preferred by most β-N-acetylhexosaminidases, with GlcNAcase activity 1.5 to 4.0 times higher than GalNAcase activity. 32 The high GlcNAcase/GalNAcase ratio (~100 times) found for Am2136 20 marks it as an unusual glycosidase among those involved in mucin degradation.

Nucleotides have positive effect on Am2136 hydrolysis ability
Among the known GH20 family members, Am2136 is unique in possessing an additional N-terminal domain (D21-G107) and a C-terminal galectin-like domain (D601-L756). 20 A galectin-like domain usually acts as a sugar moiety-recognition module. 33 We therefore evaluated the interactions of Am2136 with the carbohydrates and nucleotides available in our laboratory by fluorometry, using the intrinsic fluorescence of the protein. Intriguingly, only nucleotides exhibited significant binding to Am2136 (Figure 2). The calculated K D values indicated that nucleotide binding was specific, with the base type and the phosphate group possibly contributing to selectivity (Table 1).
Among the tested nucleotides, GDP showed the highest binding affinity (K D = 116 μM) and was therefore selected first for clarifying the effects of nucleotide binding on Am2136 function. The enzymatic activity assay was performed following the previously described protocol, 20 in which the substrate pNP-β-GlcNAc is used and the p-nitrophenol released after Am2136 cleavage is measured. Various concentrations of GDP (0-2.65 mM) were added to the reaction mixture, and the results showed a concentration-dependent enhancement of Am2136 hydrolysis activity (Fig S4A), implying a positive regulatory role for GDP. Next, 1 mM nucleotides were used to analyze their effects on the kinetic parameters. Increased Am2136 activity was again observed, to varying degrees (Table 2); nucleotide binding did not significantly affect the V max values but mainly reduced the K m values (from 1.46 mM to 0.7-0.85 mM). Furthermore, we used various oligosaccharides to confirm the hydrolysis capability of Am2136 and the modulating effect of nucleotides. Each substrate (1.2 mM) was incubated with 4 μM Am2136 at 37°C, and the GlcNAc released at different reaction times was quantified by QqQ-MS. Amygdalin (Fig S2F) was used as the internal standard (IS) to quantify the free GlcNAc in the reaction mixture (Fig S4B). As shown in Figure 3a-d, after incubation with Am2136 for 240 min, 549 μM, 308 μM, 671 μM and 448 μM GlcNAc had been released, respectively, indicating that the likely order of substrate preference was: GlcNAc-(β1,2)Man > GlcNAc-(β1,6)Gal-(β1,4)Glc > Methyl GlcNAc-(β1,3)Gal > GlcNAc-(β1,6)GalNAc-α-pNP. In addition, when 1 mM GDP was added to the reaction mixture, the amounts of free GlcNAc increased to 712.55 μM, 451.25 μM, 851.33 μM and 610.53 μM, respectively, without changing the substrate preference of Am2136 (Figure 3a-d).
Therefore, the increase in Am2136-substrate affinity caused by allosteric nucleotide binding may be a common mechanism across different substrates.
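The size of the GDP effect can be read directly from the yields quoted above. The short sketch below computes the fold increase in released GlcNAc for each panel of Figure 3 and checks that the ranking of substrates is unchanged. For orientation only, it also evaluates the rate ratio expected from Michaelis-Menten kinetics when K m falls and V max is unchanged. That comparison is an assumption-laden estimate: the K m values were measured with pNP-β-GlcNAc rather than with the oligosaccharides, and the yields are 240 min end points rather than initial rates.

```python
# GlcNAc released after 240 min (uM), panels a-d of Figure 3 as quoted in
# the text: (without GDP, with 1 mM GDP).
glcnac_uM = {
    "a": (549.0, 712.55),
    "b": (308.0, 451.25),
    "c": (671.0, 851.33),
    "d": (448.0, 610.53),
}

# Fold increase in product caused by GDP, per panel.
fold_increase = {
    panel: with_gdp / without_gdp
    for panel, (without_gdp, with_gdp) in glcnac_uM.items()
}

# Panels ordered from highest to lowest yield, with and without GDP.
rank_without = sorted(glcnac_uM, key=lambda p: glcnac_uM[p][0], reverse=True)
rank_with = sorted(glcnac_uM, key=lambda p: glcnac_uM[p][1], reverse=True)


def rate_ratio(s_mM, km_without, km_with):
    """v(with effector) / v(without) for Michaelis-Menten with equal Vmax."""
    return (km_without + s_mM) / (km_with + s_mM)


for panel in sorted(fold_increase):
    print(panel, round(fold_increase[panel], 2))
print(rank_without == rank_with)
print(rate_ratio(1.2, 1.46, 0.85), rate_ratio(1.2, 1.46, 0.7))
```

The fold increases all fall between roughly 1.3 and 1.5, and the ordering of the four panels is identical with and without GDP, in line with the statement that GDP raises activity without altering substrate preference.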
Of the limited reports available on allosteric activators of glycosyl hydrolases, only the anion effects on mammalian α-amylase have been well documented, in which chloride increases the k cat value but does not change the K m. 34 Allosteric-binding compounds have also been identified or designed for stabilizing human α-GalA (lysosomal enzyme α-galactosidase A), 35,36 and inhibitory allosteric effects of the T4 protein spackle on T4 phage gp5 lysozyme have been reported elsewhere. 37 In this work, the maximal activation of Am2136 (up to twofold) was observed at 1 mM GDP, indicating a high activation profile. Meanwhile, the presence of nucleotide is not obligatory for the catalytic reaction, suggesting that nucleotides act as nonessential activators. The unexpected finding of novel allosteric modulators of Am2136 provides further insight into the glycosidase mechanism.

Structural characterization of Am2136 in apo form
The crystal structure of Am2136 (without signal peptide residues 1-19) in apo form was solved to 2.9 Å (Table 3). As shown in Figure 4a, the catalytic domain III adopts a typical (β/α)8-barrel (TIM-barrel) fold, which is structurally conserved in GH18, GH20 and GH85 GlcNAcases. 10,38,39 Our refined Am2136 structure broadly resembles the previously reported Am2136-GlcNAc complex structure (PDB code: 6JQF), with a Cα root-mean-square deviation (rmsd) of 0.835 Å, demonstrating rigid structural integrity. However, overlaying domain III from these structures reveals obvious structural variations at the substrate-binding site. Most of the crucial residues involved in recognition of the substrate GlcNAc moiety align well structurally, except residues D412, E413 and Y415 (Figure 4b). In contrast to the GlcNAc-bound structure (PDB code: 6JQF), the side chains of these catalytic residues adopt different orientations and point away from the reaction center in our structure. It is widely recognized that GH20 enzymes catalyze reactions through the substrate-assisted mechanism. 8 Figure 4c shows that the region V406-Y415 is highly conserved among GH20 glycosidases from different species. Y415 is a crucial part of the hydrophobic pocket necessary for substrate binding, an interpretation further supported by the clearly increased K m value (2.32 mM) of the Y415A mutant (Table 4). Residue E413 provides a proton to the aglycone leaving group during the oxazolinium formation step and then abstracts one from the nucleophilic water during oxazolinium hydrolysis, while D412 stabilizes the oxazolinium intermediate. Accordingly, alanine substitutions at these sites abolished the hydrolysis activity of Am2136. The catalytic domain III is flanked by domains I, II and IV, and the majority of the interdomain contacts are made by the galectin-like domain IV (1099.1 Å²).
This galectin-like module adopts a closing-hand shape formed by two antiparallel β-sheet sandwiches (β23-β33). It forms several hydrogen bonds (R665-G546/Y547, R730-T545) and a charge-charge interaction (K714-D717) with domain III (Figure 5a), by which the extended V-type loop (V707-T732) can participate in formation of the substrate-binding pocket (Figure 5b). Based on that, we generated the T545A+G546A and ΔD717-K727 mutants with high homogeneity (Fig S5) to examine their effects on the kinetic properties. As expected, the K m value of mutant T545A+G546A was more than twice that of the wild type, and the apparent V max value was reduced to 264 μmol min⁻¹ (Table 4). Similar but more attenuated effects were observed for the ΔD717-K727 mutant. Meanwhile, various hydrogen bonds are also formed by residues from domains I, II and III (Figure 5c), among which residue D57 has been proposed to be part of the functional metal-binding region (D57, E59, Y232, Q233 and S523). 29 However, in contrast to the mutations that disrupt the domain III-IV interactions, the triple mutant D57A+E59A+Q60A had no obvious effect on the catalytic efficiency of Am2136 (Table 4).
Hence, these results demonstrate that the specific interactions between domains III and IV are an integral part of regulating the hydrolysis activity.
Figure 4. The overall structure of Am2136 in apo form. a. The Am2136 monomer structure is shown in cartoon representation, and the four domains (domain I-domain IV) are colored pink, yellow, green and cyan, respectively. b. Structural comparison of the Am2136 apo form (green) and the Am2136-GlcNAc complex (cherry). The crucial residues are shown as sticks, and GlcNAc is colored yellow. c. Amino acid sequence alignment of the essential residues of different GH20 β-N-acetylhexosaminidases. The crucial region is marked with a blue box, the completely conserved residues are shown in white on a red background, and the conserved residues are shown in red. The core residues D412 and E413 are indicated with asterisks.
To date, the structures of more than 26 GH20 family members have been solved. The conserved catalytic domain is commonly accompanied by several domains associated with diverse functions. Although A. actinomycetemcomitans dispersin B contains only a single GH20 domain, 40 the minimal functional unit of most GH20 enzymes requires a non-catalytic domain named GH20b. GH20b has an accessory but necessary role in ensuring stable expression. 41 Am2136 consists of four domains, in which the central domains II and III correspond to the GH20b and GH20 domains, respectively. The N-terminal immunoglobulin-like domain I is a distinct module and has been proposed to be essential for Am2136 stability, because the truncated version lacking residues 21-107 is not sufficiently stable. 20 This seems to indicate that the unique domain I serves a function similar to that of GH20b in promoting protein folding. Domain IV bears structural resemblance to certain galectins, 42 and its potential substrate-binding property was therefore considered, but incubation of the protein with carbohydrates did not result in any significant binding (results not shown). Nevertheless, whether its β-sheet sandwich architecture can serve as a carbohydrate-binding module still needs further analysis. More specifically, structural analysis indicates that domain IV contains an extended solvent-exposed loop that covers one side of the catalytic domain. We show that disrupting the noncovalent association of domain IV with the catalytic domain significantly reduces the activity of Am2136 in kinetic analyses, and this effect can be attributed to the close spatial and functional association between the extended loop of domain IV and the edge of the substrate-binding pocket.

Structural basis for the nucleotide recognition of Am2136
The observed link between nucleotide binding and the hydrolysis activity of Am2136 encouraged us to explore its molecular mechanism further. Although attempts were made to crystallize Am2136 with bound nucleotide, the complex structure was not obtained. We therefore used online binding-site prediction and docking to investigate the structural details. First, the apo structure of Am2136 was submitted to the binding-site prediction server PrankWeb. 43 Four possible ligand-binding pockets were identified and ranked as pockets 1-4 (Table 5 and Figure 6a). Pocket 1 has the highest score and is in fact the substrate-binding site. Pocket 4, located on top of β4 and β12, has the lowest probability, and we therefore excluded this site from the subsequent docking experiments. The ligand GDP was docked to the Am2136 apo structure using the AutoDock suite of tools. The centers of predicted pockets 1-3 were used as grid centers, and the docking parameters were set to 200 runs per protein-ligand complex. Clustering was performed at 2.0 Å r.m.s. to validate convergence to the best pose, which was defined as the conformation with the lowest binding free energy. The clustering figure (Figure 7) shows that the lowest binding energy (−8.72 kcal/mol) and the best clustering (112 out of 200) were obtained from docking toward pocket 3 (Figure 6b). Analysis of the best docked complexes reveals that domains III and IV both participate in GDP recognition. Five critical residues (S249, K252, R256, Y550 and E649) form direct interactions with GDP. The guanine moiety is fixed by hydrogen bonds with the main-chain atoms of Y550 and E649 (Figure 6c).
The side chains of E649 and S249 directly engage the ribose, and the basic residues R256/K252 may form salt bridges with the phosphate group. Additionally, residues Y327, Y331, P595 and H597, which surround the ligand-binding site, may also contribute to ligand selectivity. As nucleotides are structurally diverse, we also analyzed the binding of different nucleotides (ADP, UDP, CDP) by docking. According to the docking results with the lowest binding energy (Table 6), most interactions are conserved in the binding of different nucleotides, especially those involving residues S249, R256 and P595 (Figure 6d-f). To explore whether these residues are conserved in other glycosidases, several GH20 enzymes with available structures (SpHex, VhGlcNAcase, BbLNBase and SmGH20A; PDB codes 1M01, 6K35, 4H04 and 1QBB, respectively) were structurally aligned with Am2301. 10,12,19,44-46 The residues forming the binding pocket are not conserved (Figure 8), suggesting that nucleotide regulation is a mechanism unique to Am2136.
* The score means the sequence evolutionary conservation value of residues.
* Points with high predicted ligand ability are clustered and ranked according to a ranking function based on cumulative score of the cluster.
The predicted nucleotide-binding site is similar to the ion-binding site observed in the previously published Am2136 structure (PDB code: 6JQF). 20 We further found that, with the exception of Ca2+ and Mg2+, certain divalent metal ions, including Cu2+, Zn2+ and Co2+, significantly inhibit Am2136 (Table 7). However, the presence or absence of Ca2+ or Mg2+ had no obvious effect on GDP binding to Am2136 (Table 8), suggesting that this site binds nucleotides more specifically than metal ions. The binding site and regulatory effects of an allosteric effector are distinct from those of both substrate and product. Different allosteric mechanisms have been described for nucleotide effectors. Global rearrangements, including conformational changes within both the catalytic and regulatory modules, are quintessential properties of CTP-regulated aspartate carbamoyltransferase. 47 For sterile alpha motif and HD domain-containing protein 1 (SAMHD1), by contrast, additional nucleotides at allosteric sites stabilize inter-monomer interactions and thereby support tetramer assembly. 48 According to our molecular docking calculations, the most probable nucleotide-binding site of Am2136 lies in the cleft between domains III and IV, more than 28.6 Å from the active site. This result is consistent with the observation that the domain III-domain IV interaction is indispensable for efficient hydrolysis activity, and it suggests a possible mechanism that couples the allosteric effect to the interdomain association.

Biochemical and mutational analysis of the putative GDP binding sites
To validate the in silico predictions of the GDP-Am2136 interactions experimentally, single and double Ala substitutions were constructed at Y550, E649, R256 and P595/H597, and the mutation of a neighboring basic residue, R592A, served as a negative control. A Y-to-F double substitution was made at Y327/Y331 to define the roles of these tyrosines in phosphate group recognition.
In the enzymatic activity assay without GDP, the above mutants all retain the same kinetics profile as the wildtype while only Y550A and P595A/H597A (Table 9). In contrast, the positive allosteric effect of GDP on substrate binding was largely impaired when mutations were introduced at the predicted GDP-anchoring residues (Table 10). In the presence of 1 mM GDP, the Km values of the E649A, Y550A, Y327F/Y331F and R256A variants were reduced significantly less than that of Am2136-WT. Comparing the catalytic efficiencies (kcat/Km) with and without GDP, P595A/H597A showed a roughly two-fold increase, the same as the WT and the control R592A, indicating that these two residues are not determinants of GDP binding.
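The comparison of catalytic efficiencies with and without GDP can be sketched as a simple ratio of kcat/Km values. The snippet below is a minimal illustration; the function names and the numerical inputs are hypothetical placeholders chosen to give a two-fold change and are not values from Table 9 or Table 10.

```python
def catalytic_efficiency(kcat_per_s: float, km_mM: float) -> float:
    """Return kcat/Km in mM^-1 s^-1."""
    return kcat_per_s / km_mM


def fold_activation(apo: tuple, with_gdp: tuple) -> float:
    """Ratio of kcat/Km with GDP to kcat/Km without GDP.

    Each argument is a (kcat in s^-1, Km in mM) pair.
    """
    return catalytic_efficiency(*with_gdp) / catalytic_efficiency(*apo)


# Hypothetical placeholder kinetics, NOT measured values from this study:
# the same kcat with a halved Km gives a two-fold gain in efficiency.
example_apo = (10.0, 0.50)
example_with_gdp = (10.0, 0.25)
fold = fold_activation(example_apo, example_with_gdp)
print(f"Fold activation by GDP: {fold:.1f}")
```

A variant whose fold activation stays close to that of the WT, as described for P595A/H597A and R592A, would be read as retaining GDP responsiveness, whereas a ratio closer to 1 would indicate an impaired allosteric effect.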
Intrinsic fluorescence spectroscopy measurements were carried out to further correlate the functional effects of these mutations with GDP-protein binding. As shown in Table 11, compared with the WT and R592A, the GDP binding affinity of E649A was markedly reduced (KD increased from 116 μM to 236 μM). Thus, E649 plays a vital role in GDP recognition. Various degrees of weakened GDP binding were also observed for Y550A, Y327F/Y331F and R256A. In agreement with the kinetic analysis, GDP binding is less sensitive to the P595A/H597A mutation. Given the consistency between the activity assays and the GDP-binding measurements, the kinetic and regulatory effects of GDP are evidently sensitive to residues Y550, E649, R256, Y327 and Y331, whereas P595 and H597 are only partly involved in GDP recognition.
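To put the two KD values quoted above in context, the fraction of enzyme occupied by GDP can be estimated with a single-site binding model, fraction bound = [L] / (KD + [L]). The sketch below evaluates it at 1 mM GDP, the concentration used in the kinetic assays. It assumes one independent GDP site, free GDP approximately equal to total GDP, and that 116 μM corresponds to the WT; these are simplifying assumptions for illustration, not statements from the measurements.

```python
def fraction_bound(ligand_uM: float, kd_uM: float) -> float:
    """Single-site occupancy, assuming free ligand is close to total ligand."""
    return ligand_uM / (kd_uM + ligand_uM)


GDP_UM = 1000.0  # 1 mM GDP, as used in the kinetic assays

# KD values quoted in the text; assigning 116 uM to the WT is an assumption
kd_values_uM = {"WT": 116.0, "E649A": 236.0}

occupancy = {name: fraction_bound(GDP_UM, kd) for name, kd in kd_values_uM.items()}
for name, value in occupancy.items():
    print(f"{name}: {value:.3f} of sites occupied at 1 mM GDP")
```

Under this simple model both proteins would be largely occupied at 1 mM GDP, so a two-fold change in KD alone shifts occupancy only modestly; any comparison with the kinetic data should keep the model's assumptions in mind.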
Structure-guided mutagenesis and kinetic studies are crucial for elucidating the synergy between regulatory and active sites. In the structural and mutagenesis studies on SAMHD1, 48 changes to residues that interact with the effector dGTP significantly decreased the turnover rate, confirming an allosteric mechanism in which dGTP-promoted tetramerization induces correct active-site formation. In our results, the changes introduced in the nucleotide-binding pocket of Am2136 decreased but did not abolish the allosteric effect, indicating a certain conformational tolerance in this binding site. It is also possible that nucleotide binding provokes conformational changes, and that the resulting allosteric activation increases the affinity between enzyme and substrate. Although nucleotide binding is not a "turn-on switch" for Am2136, the observed ligand specificity and "efficient activator" profile suggest that novel and better-performing β-N-acetylhexosaminidases could be identified or engineered.

Conclusion
In this work, we purified the enzyme Am2136 and explored its potentially broad carbohydrate-cleavage capability. Intriguingly, we found that Am2136 binds nucleotides. Functional characterization demonstrated that the substrate-binding affinity, and hence the enzymatic efficiency, of Am2136 could be significantly improved by the addition of di- or triphosphate nucleotides. To our knowledge, Am2136 is therefore the first reported glycosidase that exhibits allosteric activation in the presence of these nucleotide effectors. We further determined the crystal structure of Am2136 in the apo form by molecular replacement and refined it to 2.9 Å resolution. In particular, interdomain contacts between domains III and IV are crucial for its β-N-acetylhexosaminidase activity. Molecular docking simulations and site-directed mutagenesis further revealed the critical amino acids involved in nucleotide effector recognition and regulation. The nucleotide-assisted activation does not necessarily make Am2136 more efficient than its homologs (the highest catalytic efficiency (kcat/Km) of a GH20 β-N-acetylhexosaminidase listed in the BRENDA database reaches 3428 mM−1 s−1), 49,50 but it does make its regulation more flexible. Collectively, these results provide new insights into the mechanisms underlying mucin degradation by microbial β-N-acetylhexosaminidases and a useful basis for elucidating the possible biological significance of Am2136 in A. muciniphila. They also indicate that further exploration of novel modes of glycoside hydrolase regulation will contribute to the utilization of β-N-acetylhexosaminidases.